corresponding author
Abstract:Accurate 3D semantic occupancy perception is essential for autonomous driving in complex environments with diverse and irregular objects. While vision-centric methods suffer from geometric inaccuracies, LiDAR-based approaches often lack rich semantic information. To address these limitations, MS-Occ, a novel multi-stage LiDAR-camera fusion framework which includes middle-stage fusion and late-stage fusion, is proposed, integrating LiDAR's geometric fidelity with camera-based semantic richness via hierarchical cross-modal fusion. The framework introduces innovations at two critical stages: (1) In the middle-stage feature fusion, the Gaussian-Geo module leverages Gaussian kernel rendering on sparse LiDAR depth maps to enhance 2D image features with dense geometric priors, and the Semantic-Aware module enriches LiDAR voxels with semantic context via deformable cross-attention; (2) In the late-stage voxel fusion, the Adaptive Fusion (AF) module dynamically balances voxel features across modalities, while the High Classification Confidence Voxel Fusion (HCCVF) module resolves semantic inconsistencies using self-attention-based refinement. Experiments on the nuScenes-OpenOccupancy benchmark show that MS-Occ achieves an Intersection over Union (IoU) of 32.1% and a mean IoU (mIoU) of 25.3%, surpassing the state-of-the-art by +0.7% IoU and +2.4% mIoU. Ablation studies further validate the contribution of each module, with substantial improvements in small-object perception, demonstrating the practical value of MS-Occ for safety-critical autonomous driving scenarios.
Abstract:The remote sensing image change detection task is an essential method for large-scale monitoring. We propose HSANet, a network that uses hierarchical convolution to extract multi-scale features. It incorporates hybrid self-attention and cross-attention mechanisms to learn and fuse global and cross-scale information. This enables HSANet to capture global context at different scales and integrate cross-scale features, refining edge details and improving detection performance. We will also open-source our model code: https://github.com/ChengxiHAN/HSANet.
Abstract:Orthogonal time frequency space (OTFS) modulation is anticipated to be a promising candidate for supporting integrated sensing and communications (ISAC) systems, which is considered as a pivotal technique for realizing next generation wireless networks. In this paper, we develop a minimum bit error rate (BER) precoder design for an OTFS-based ISAC system. In particular, the BER minimization problem takes into account the maximum available transmission power budget and the required sensing performance. Different from prior studies that considered ISAC in the time-frequency (TF) domain, we devise the precoder from the perspective of the delay-Doppler (DD) domain by exploiting the equivalent DD domain channel due to the fact that the DD domain channel generally tends to be sparse and quasi-static, which can facilitate a low-overhead ISAC system design. To address the non-convex optimization design problem, we resort to optimizing the lower bound of the derived average BER by adopting Jensen's inequality. Subsequently, the formulated problem is decoupled into two independent sub-problems via singular value decomposition (SVD) methodology. We then theoretically analyze the feasibility conditions of the proposed problem and present a low-complexity iterative solution via leveraging the Lagrangian duality approach. Simulation results verify the effectiveness of our proposed precoder compared to the benchmark schemes and reveal the interplay between sensing and communication for dual-functional precoder design, indicating a trade-off where transmission efficiency is sacrificed for increasing transmission reliability and sensing accuracy.
Abstract:In this paper, we investigate the performance of the cross-domain iterative detection (CDID) framework with orthogonal time frequency space (OTFS) modulation, where two distinct CDID algorithms are presented. The proposed schemes estimate/detect the information symbols iteratively across the frequency domain and the delay-Doppler (DD) domain via passing either the a posteriori or extrinsic information. Building upon this framework, we investigate the error performance by considering the bias evolution and state evolution. Furthermore, we discuss their error performance in convergence and the DD domain error state lower bounds in each iteration. Specifically, we demonstrate that in convergence, the ultimate error performance of the CDID passing the a posteriori information can be characterized by two potential convergence points. In contrast, the ultimate error performance of the CDID passing the extrinsic information has only one convergence point, which, interestingly, aligns with the matched filter bound. Our numerical results confirm our analytical findings and unveil the promising error performance achieved by the proposed designs.
Abstract:Multiple access is the cornerstone technology for each generation of wireless cellular networks and resource allocation design plays a crucial role in multiple access. In this paper, we present a comprehensive tutorial overview for junior researchers in this field, aiming to offer a foundational guide for resource allocation design in the context of next-generation multiple access (NGMA). Initially, we identify three types of channels in future wireless cellular networks over which NGMA will be implemented, namely: natural channels, reconfigurable channels, and functional channels. Natural channels are traditional uplink and downlink communication channels; reconfigurable channels are defined as channels that can be proactively reshaped via emerging platforms or techniques, such as intelligent reflecting surface (IRS), unmanned aerial vehicle (UAV), and movable/fluid antenna (M/FA); and functional channels support not only communication but also other functionalities simultaneously, with typical examples including integrated sensing and communication (ISAC) and joint computing and communication (JCAC) channels. Then, we introduce NGMA models applicable to these three types of channels that cover most of the practical communication scenarios of future wireless communications. Subsequently, we articulate the key optimization technical challenges inherent in the resource allocation design for NGMA, categorizing them into rate-oriented, power-oriented, and reliability-oriented resource allocation designs. The corresponding optimization approaches for solving the formulated resource allocation design problems are then presented. Finally, simulation results are presented and discussed to elucidate the practical implications and insights derived from resource allocation designs in NGMA.
Abstract:Phytoplankton, a crucial component of aquatic ecosystems, requires efficient monitoring to understand marine ecological processes and environmental conditions. Traditional phytoplankton monitoring methods, relying on non-in situ observations, are time-consuming and resource-intensive, limiting timely analysis. To address these limitations, we introduce PhyTracker, an intelligent in situ tracking framework designed for automatic tracking of phytoplankton. PhyTracker overcomes significant challenges unique to phytoplankton monitoring, such as constrained mobility within water flow, inconspicuous appearance, and the presence of impurities. Our method incorporates three innovative modules: a Texture-enhanced Feature Extraction (TFE) module, an Attention-enhanced Temporal Association (ATA) module, and a Flow-agnostic Movement Refinement (FMR) module. These modules enhance feature capture, differentiate between phytoplankton and impurities, and refine movement characteristics, respectively. Extensive experiments on the PMOT dataset validate the superiority of PhyTracker in phytoplankton tracking, and additional tests on the MOT dataset demonstrate its general applicability, outperforming conventional tracking methods. This work highlights key differences between phytoplankton and traditional objects, offering an effective solution for phytoplankton monitoring.
Abstract:This paper proposes a joint active and passive beamforming design for reconfigurable intelligent surface (RIS)-aided wireless communication systems, adopting a piece-wise near-field channel model. While a traditional near-field channel model, applied without any approximations, offers higher modeling accuracy than a far-field model, it renders the system design more sensitive to channel estimation errors (CEEs). As a remedy, we propose to adopt a piece-wise near-field channel model that leverages the advantages of the near-field approach while enhancing its robustness against CEEs. Our study analyzes the impact of different channel models, including the traditional near-field, the proposed piece-wise near-field and far-field channel models, on the interference distribution caused by CEEs and model mismatches. Subsequently, by treating the interference as noise, we formulate a joint active and passive beamforming design problem to maximize the spectral efficiency (SE). The formulated problem is then recast as a mean squared error (MSE) minimization problem and a suboptimal algorithm is developed to iteratively update the active and passive beamforming strategies. Simulation results demonstrate that adopting the piece-wise near-field channel model leads to an improved SE compared to both the near-field and far-field models in the presence of CEEs. Furthermore, the proposed piece-wise near-field model achieves a good trade-off between modeling accuracy and system's degrees of freedom (DoF).
Abstract:Integrated sensing and communication (ISAC) is envisioned as a key pillar for enabling the upcoming sixth generation (6G) communication systems, requiring not only reliable communication functionalities but also highly accurate environmental sensing capabilities. In this paper, we design a novel networked ISAC framework to explore the collaboration among multiple users for environmental sensing. Specifically, multiple users can serve as powerful sensors, capturing back scattered signals from a target at various angles to facilitate reliable computational imaging. Centralized sensing approaches are extremely sensitive to the capability of the leader node because it requires the leader node to process the signals sent by all the users. To this end, we propose a two-step distributed cooperative sensing algorithm that allows low-dimensional intermediate estimate exchange among neighboring users, thus eliminating the reliance on the centralized leader node and improving the robustness of sensing. This way, multiple users can cooperatively sense a target by exploiting the block-wise environment sparsity and the interference cancellation technique. Furthermore, we analyze the mean square error of the proposed distributed algorithm as a networked sensing performance metric and propose a beamforming design for the proposed network ISAC scheme to maximize the networked sensing accuracy and communication performance subject to a transmit power constraint. Simulation results validate the effectiveness of the proposed algorithm compared with the state-of-the-art algorithms.
Abstract:The recently proposed orthogonal time frequency space (OTFS) modulation, which is a typical Delay-Doppler (DD) communication scheme, has attracted significant attention thanks to its appealing performance over doubly-selective channels. In this paper, we present the fundamentals of general DD communications from the viewpoint of the Zak transform. We start our study by constructing DD domain basis functions aligning with the time-frequency (TF)-consistency condition, which are globally quasi-periodic and locally twisted-shifted. We unveil that these features are translated to unique signal structures in both time and frequency, which are beneficial for communication purposes. Then, we focus on the practical implementations of DD Nyquist communications, where we show that rectangular windows achieve perfect DD orthogonality, while truncated periodic signals can obtain sufficient DD orthogonality. Particularly, smoothed rectangular window with excess bandwidth can result in a slightly worse orthogonality but better pulse localization in the DD domain. Furthermore, we present a practical pulse shaping framework for general DD communications and derive the corresponding input-output relation under various shaping pulses. Our numerical results agree with our derivations and also demonstrate advantages of DD communications over conventional orthogonal frequency-division multiplexing (OFDM).
Abstract:Self-supervised contrastive learning strategy has attracted remarkable attention due to its exceptional ability in representation learning. However, current contrastive learning tends to learn global coarse-grained representations of the image that benefit generic object recognition, whereas such coarse-grained features are insufficient for fine-grained visual recognition. In this paper, we present to incorporate the subtle local fine-grained feature learning into global self-supervised contrastive learning through a pure self-supervised global-local fine-grained contrastive learning framework. Specifically, a novel pretext task called Local Discrimination (LoDisc) is proposed to explicitly supervise self-supervised model's focus towards local pivotal regions which are captured by a simple-but-effective location-wise mask sampling strategy. We show that Local Discrimination pretext task can effectively enhance fine-grained clues in important local regions, and the global-local framework further refines the fine-grained feature representations of images. Extensive experimental results on different fine-grained object recognition tasks demonstrate that the proposed method can lead to a decent improvement in different evaluation settings. Meanwhile, the proposed method is also effective in general object recognition tasks.